MEWSE: Multi-Engine Workflow Submission and Execution on Apache YARN

Authors

  • Kiran Sundaravarathan
  • Patrick Martin
  • Dan Rope
  • Mike McRoberts
  • Craig Statchuk
Abstract

In this era of Big Data, designing a workflow to gain insights from vast amounts of data has become more complex. Several frameworks process batch and streaming data individually, but coordinating jobs between the engines in a workflow introduces performance penalties and related issues. Current workflow systems typically run on only one engine and do not offer the versatility required for today’s workflows. Submitting jobs to different engines manually is not only time-consuming but also requires expertise in working with each of those engines. In this thesis, we address these issues by proposing MEWSE (Multi-Engine Workflow Submission and Execution on Apache YARN). MEWSE is designed with plug-and-play functionality to allow the inclusion of new engines. MEWSE has been tested on Amazon EC2 with a sample workflow that requires Hadoop, Mahout, Java, and several scripts to process the data.

Similar resources

SAASFEE: Scalable Scientific Workflow Execution Engine

Across many fields of science, primary data sets like sensor read-outs, time series, and genomic sequences are analyzed by complex chains of specialized tools and scripts exchanging intermediate results in domain-specific file formats. Scientific workflow management systems (SWfMSs) support the development and execution of these tool chains by providing workflow specification languages, graphic...

Hi-WAY: Execution of Scientific Workflows on Hadoop YARN

Scientific workflows provide a means to model, execute, and exchange the increasingly complex analysis pipelines necessary for today’s data-driven science. However, existing scientific workflow management systems (SWfMSs) are often limited to a single workflow language and lack adequate support for large-scale data analysis. On the other hand, current distributed dataflow systems are based on a...

Architectural Plan for Constructing Fault Tolerable Workflow Engines Based on Grid Service

In this paper the design and implementation of fault tolerable architecture for scientific workflow engines is presented. The engines are assumed to be implemented as composite web services. Current architectures for workflow engines do not make any considerations for substituting faulty web services with correct ones at run time. The difficulty is to rollback the execution state of the workflo...

BiobankCloud: A Platform for the Secure Storage, Sharing, and Processing of Large Biomedical Data Sets

Biobanks store and catalog human biological material that is increasingly being digitized using next-generation sequencing (NGS). There is, however, a computational bottleneck, as existing software systems are not scalable and secure enough to store and process the incoming wave of genomic data from NGS machines. In the BiobankCloud project, we are building a Hadoop-based platform for the secur...

Publication date: 2016